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Abstract 

Background: Honning endonucleases (HEases) are a large and diverse group of site-specific DNAases. They reside 
within self-splicing introns and inteins, and promote their horizontal dissemination. In recent years, HEases have 
been the focus of extensive research due to their promising potential use in gene targeting procedures for the 
treatment of genetic diseases and for the genetic engineering of crop, animal models and cell lines. 

Results: Using mathematical analysis and computational modeling, we present here a novel account for the 
evolution and population dynamics of HEase genes (HEGs). We describe HEGs as paradoxical selfish elements 
whose long-term persistence in a single population relies on low transmission rates and a positive correlation 
between transmission efficiency and toxicity. 

Conclusion: Plausible conditions allow HEGs to sustain at high frequency through long evolutionary periods, with 
the endonuclease frequency being either at equilibrium or periodically oscillating. The predictions of our model 
may prove important not only for evolutionary theory but also for gene therapy and bio-engineering applications 
of HEases. 



Background 

Self-splicing introns and inteins are genetic elements that 
are transcribed as part of genes; they remove themselves 
from the transcript before translation (introns) or from 
the translated protein (inteins) [1,2]. Homing endonu- 
clease genes (HEGs) are molecular parasites that are fre- 
quently encoded as open reading frames on class I introns, 
or as part of inteins [3]. The recognition site of these 
HEases is so large that they cleave a genome only in one 
or very few places. This has prompted attempts to use 
HEases in gene therapy and in the genetic engineering of 
large and complex genomes [4-6]. HEases have been 
designed to target disease-associated genes, such as XPC 
[7] and RAGl [8], to induce targeted integration in cell 
lines [9,10], and for targeted mutagenesis in crop [11]. In 
the natural host, a HEase promotes the horizontal propa- 
gation of its respective intron/intein into an intron-less or 
intein-less allele by cleaving the vacant allele to induce 
double strand break repair by homologous recombination 
(HR). The finding that the HEases encoded in inteins and 
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introns belong to different endonuclease families [12,13] 
reveals that the unions between HEGs and self-splicing 
elements occurred several times, possibly because both 
HEases and introns/inteins evolved to target similar sites 
[14]. This union between self-splicing elements and endo- 
nucleases created molecular parasites, which have to be 
regarded as distinct evolutionary units, and whose fate is 
intertwined but separate from the fate of the host protein 
and the host organism [15]. The life cycle of inteins/ 
introns with HEGs has been described by the homing 
cycle [16,17]. The homing cycle addresses the following 
dilemma: if the HEase is very successful, after a short 
phase of super-Mendelian inheritance, the HEase targets 
in all members of the population will be occupied by a 
HEG; however, once the HEase has ran out of substrate, 
its activity is no longer under purifying selection; the HEG 
begins a decay process, resulting in self-splicing elements 
that lost the ability to home. At the end of this decay pro- 
cess are class I introns that do not encode HEases, and 
mini inteins, i.e., inteins without a HEase domain. Once 
the dysfunctional HEG is fixed in the population (through 
drift or by the selective disadvantage conveyed by the 
active HEase), the splicing element can be deleted, thus 
returning the homing cycle to its beginning. In case of 
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introns, the precise deletion may occur via a processed 
mRNA intermediate [18,19]. Inteins can only be elimi- 
nated by random precise deletion, which may be very rare. 
A precise deletion likely is required to retain a functional 
host protein, because inteins and class I introns are found 
in the most conserved parts of conserved proteins, and an 
imprecise deletion likely would yield a non-functional 
protein [20]. 

Support and discrepancies encountered by the homing 
cycle model 

The homing cycle is based on fixation within and transfer 
between populations [16,17,21]; however, the homing 
cycle usually was evaluated on the basis of transfer 
between species. The transfer of HEG containing inteins 
and introns across species boundaries is revealed through 
the high sequence similarity of these molecular parasites 
found in distantly related organisms [17,22,23]. In many 
instances, where closely related organisms were sampled, 
the phylogeny of the HEG was found not to agree with 
the phylogeny of the host [15,16,24-26]. This confirms 
the inter-species transfer of HEG containing elements, in 
agreement with the homing cycle model as formulated 
for species. 

However, some recent findings suggest that HEGs can 
persist in a single species over very long periods of time. 
For example, molecular parasites with functional HEGs 
were found in the apparently asexual amoeba Naegleria 
[27], and the PRP8 intein with HEG persisted over long 
periods in euascomycetes ([21] provide an estimate of over 
500 million years) under purif)^ing selection [28] and with- 
out any evidence for their horizontal transfer. Gogarten 
and Hilario [21] discuss three possible explanations for the 
long term persistence of HEG containing molecular para- 
sites: (a) the molecular parasites may have acquired a func- 
tion that provides an adaptive advantage to the host, (b) 
the homing cycle could operate on a subpopulation level 
in a spatially distributed inhomogeneous population, (c) 
the super-Mendelian inheritance may be balanced by 
selective disadvantages that the HEase and the self- splicing 
element confer to the host organism. For the latter case 
Yahara et al, [29] showed that stable equilibria can exist, 
in which all three types of homing sites coexist in a well- 
mixed population but only under the highly unlikely 
assumption that the reduction in fitness resulting from 
carrying a dysfunctional HEase is higher than the reduc- 
tion in fitness due to the functional HEase. For more rea- 
listic conditions, when the cost of the functional HEase is 
higher than the cost of the dysfunctional HEase, only peri- 
odic solutions were reported by [29] that are similar to a 
typical Lotka-Volterra type predator-prey system [30] with 
the trajectory following the phases of the homing cycle 
(invasion, fixation, decay, complete loss; however, the fixa- 
tion phase does not go to completion, and reinvasion 



occurs from within the population). Under these condi- 
tions and for the parameter choices analyzed by Yahara 
et al. [29] the average frequency of alleles with functional 
HEG is low, often verging on extinction during each cycle 
of the trajectory. This finding is in conflict with the obser- 
vation that functional HEGs are often sampled from 
natural populations, suggesting high frequency. 

We therefore decided to explore more thoroughly the 
possibility of long-term persistence of functional HEGs 
at high frequency. Using analytical and simulation 
approaches, we study the behavior of an iterative model 
that simulates allele frequencies of HEGs in randomly 
interbreeding populations. We report on conditions that 
allow for the long-term persistence of functional HEGs 
at high frequency, and we explore the parameter choices 
under which the persistence of HEGs at high frequency 
is dynamically stable and evolutionarily stable. We find 
that, for the range of parameters analyzed, a HEG can 
persist at high frequency if the effective homing fre- 
quency per generation is low, which can be accom- 
plished either through low homing efficiency, or 
through low frequency of sexual recombination. We 
also find that an internal equilibrium can be reached 
with the HEG being at a high frequency, if the self-spli- 
cing sequence in which it resides is to some extent 
toxic. Finally, we find that for these solutions to be evo- 
lutionarily stable, in the sense of [31], requires a positive 
correlation between the homing frequency and the fit- 
ness cost of the HEG, an assumption that we discuss 
and regard as biologically plausible. 

We note that the relative fitness of the three alleles in 
competition represents an incarnation of the rock-paper- 
scissors game (Figure 1). In pair-wise competition the 
three alleles form a circular inequality. Given previous 
explorations of similar non-transitive interaction systems 
(e.g., [32]), the finding of stable and evolutionarily stable 
equilibria in a spatially homogenous implementation of 
the rock-paper-scissors game is surprising and may be 
relevant for the field of game theory. 

Results 

Model 

Our model aims to describe the evolutionary dynamics of 
HEGs in populations of organisms engaging in genetic 
exchange. We assume that mating and genetic exchange 
are random, that generations are discrete and non-over- 
lapping and that selection acts at the haploid stage. We 
assume the population is large enough to neglect the effect 
of genetic drift, and allow no gene flow into or out of the 
population. All organisms in the population are considered 
isogenic apart from a single gene with three different 
alleles: X - an allele harboring no insertion sequence; Y - 
an allele harboring an intron or an intein but no functional 
HEG; and Z - an allele harboring an intron or an intein 



Barzel et al. BMC Evolutionary Biology 201 1, 1 1:324 
http://www.biomedcentral.eom/1 471 -21 48/1 1 /324 



Page 3 of 1 3 



X 



Alleles with empty 
Target Site 



Y 




(5> 



\ \ \ 



^ '<?i 



z 



Alleles harboring a 
dysfunctional 

Homing 
Endonuclease 



Alleles invaded by a 
functional Homing 
Endonuclease 



(II) Y > Z : Carriers of the Y-allele 
are more fit than carriers of the Z 
allele, and the presence of a 
dysfunctional homing 
endonuclease provides immunity 
to invasion by Z 

Figure 1 A Molecular Rock-Paper-Scissors game: The relationships between carriers of the X, Y and Z alleles are descried through 
three inequalities (I, II, III). (I) Z outcompetes X because of the activity of the homing endonuclease during sex (with probability hm), (II) Y 
beats Z, and (III) X beats Y because of the decrease in relative fitness associated with the homing ability (parameter t) and with the splicing 
element (parameter s). The central arrows depict the serial succession of the state of a particular homing endonuclease target site: The empty 
target site (X) is converted to Z through invasion by the homing endonuclease; the homing endonuclease in a Z allele can undergo mutation to 
become a Y allele with a dysfunctional homing endonuclease, which provides immunity to invasion by Z. Finally, a precise deletion of the 
intein/intron can restore the empty target site. 



encoding for a functional HE. The frequencies of the X, Y 
and Z alleles are designated x, y and z respectively, with 
X + y + z = 1. 

We assume that a homing endonuclease can degener- 
ate at a rate u per generation - converting a Z allele to a 
Y allele (Additional file 1, Table SI also showing respec- 
tive notation in [29] to allow comparison of the two 
models). The reciprocal conversion is neglected. Unlike 
the endonuclease activity, a complete degeneration of the 
splicing function is detrimental for the organism because 
it impairs the function of the host gene. Modeling the 
inactivation of the HEG by a single rate constant is a sim- 
plification; more subtle changes in either splicing or 
cleaving efficiency are considered in detail below. The 
model allows both the Z and the Y alleles to undergo 



precise elimination at a rate v per generation, giving rise 
to an X allele. The X allele can in turn be converted to a 
Z allele only through homing. We assume that non- 
HEase induced gene conversion is globally reciprocal and 
hence negligible. For the same reason, we allow no X to 
Y conversion. In our model, v is taken to be much smal- 
ler than u. Precise intron erasure can occur by gene con- 
version between a genomic Y or Z allele and the cDNA 
of its transcript [18,19], or through gene duplication 
through an mRNA intermediate. Complete intein aboli- 
tion may be rarer because it relies on serendipitous pre- 
cise deletion. The effect of homing on the Z- and X-allele 
frequencies is determined by two parameters: m - the 
rate of mating or horizontal gene transfer per generation 
and h - the probability of homing in a diploid carrying an 
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X allele and a Z allele (or the rate of homing in a merodi- 
ploid carrying a genomic X allele and an exogenous Z 
allele). We note that the factors h and m do not occur 
independently of each other in our equations, and that 
the product hm is a measure of the flux from Z to X 
being proportional to x''z. Finally, we assume that the 
insertion sequences may have an effect on the fitness of 
the organism. We define the relative fitness of an organ- 
ism carrying an X allele to be 1. An organism carrying a 
Y allele has fitness l-s, and an organism with a Z allele 
has fitness 1-s-t, with s and t being non-negative and 
small. 

While homing endonucleases are usually considered to 
be molecular parasites, they may at some point in evolu- 
tion become beneficial to their host[21,33,34], which 
would translate to t < 0 in our nomenclature. Clearly, an 
allele that increases fitness of its carriers can be driven 
into fixation. A striking example is the HO endonuclease 
from Saccharomyces cerevisiae which is an evolutionary 
derivative of a HEG that has acquired an essential role in 
the yeast mating type switch[35]. During its domestication 
process, HO has lost its ability for super-Mendelian inheri- 
tance. In particular, it is not coded within an intron or an 
active intein and is not flanked by homologies to its target 
site. Here we are interested in the evolutionary persistence 
of homing endonuclease parasites as such. We are inter- 
ested in the persistence of the homing ability and its link- 
age to intron/intein mobility. We therefore chose to put 
aside the case of t < 0 in order to focus on the tri-partite 
dynamic between two symbiotic parasites (intron and 
HEG) and their host that takes place for t > 0 values. We 
define t to be the toxicity of the homing activity of the 
endonuclease, which is assumed to have a non-negative 
value, as is also assumed for the fitness cost of the inser- 
tion sequence (s). 

We study the above model analytically and numerically 
to find and describe the parameter sets allowing the evolu- 
tionary persistence of homing endonucleases (Z alleles). 
We derived a set of three recursive functions describing 
the X, Y and Z allele frequencies at the n+l* generation 
given their frequencies at the n^'^ generation for a given 
parameter set {u, v, h, m, s, t} (see derivation in additional 
file 2, Table S2). The symbol W stands for the average fit- 
ness of the population W = x + (l-s)y + (l-s-t)z: 



1 



(3) 



1 - 



[Xn +v(Wn -Xn)] 
Wn 

hm(l -v)(l -u)(l - s-t)Zn 



(l-v)[(l-s)y, + u(l-s-t)zn] 

Wn 



(1) 



W = x+(l-s)y+(l-s-t)z 

[Xn + v(Wn - Xn)] 



Xn+1 



Yn+l 



Wn 

hm(l -v)(l -u)(l - s-t)Zn 
(l_v) [(i-s)y^ + u(l-s-t)Zn] 



Wn 



Zn+i = 1 - Xn+i - Yn+i 

This interplay between the alleles allows the HEG to 
be maintained in a population, if the frequency of the 
three alleles reaches either equilibrium (e.g. Figure 2a) 
or periodic oscillations (e.g. Figure 2b). Notably, when 
equilibrium is reached, it is often reached as the damp- 
ing of oscillations, and in finite populations it is sensi- 
tive to stochastic perturbations to an extent discussed 
below. 

In contrast to previous reports [29], we find a plausi- 
ble range of parameters that allows all three alleles to be 
maintained at an internal equilibrium with the HEG (Z 
allele) being at high frequency (e.g. Figure 2A). The 



(2) 





llmmmmmmmm- 



Figure 2 Examples of homing endonuclease persistence at a 
high frequency. A) Equilibrium witli a liigli frequency obtained 
using tine parameters: s = 0.01, t = 0.001, lim = 0.0123, u = 10"^ and 
V = 10"^. B) A periodic solution obtained using the parameters: s = 
0.027, t = 0.0027, hm = 0.0969, u = 10"^ and v = 10"^ (small top 
panel: same for 10^ generations). 
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conditions and inter-dependencies between the para- 
meters that define this range can be discerned from an 
analysis of the equilibrium equations: Xn+i = Xn and, 
Yn+i = Yn (and therefore also Zn+i = Zn). The analysis 
leads to some non-trivial, perhaps even surprising, con- 
clusions demonstrating the uniqueness of the evolution- 
ary dynamics of homing endonucleases. 

Analytical bound 1 

For an internal equilibrium to be maintained with the 
Z allele being at high frequency, both the Y and the Z 
alleles must have a reduced fitness with respect to the X 
allele. In particular, because the parameter s must be 
larger than or equal to a non-negligible positive number: 

Yeq 



u(l -t)(l -v) - 



(l-v) 



/ eq 
^q 



In other words, the insertion 



sequence must be toxic in order for the HEG to reach 
high equilibrium frequency. 



Mathematical formulation 

u(l-t)(l 

(l-v) 



X ^eq 
■ V) - V— 

^q 



/eq 
^q 



+ U 



(4) 



(see additional file 3, Proof SI). 



In particular, Zgq > y^, 



s > 



u(l-t)(l-v) 



' ~ (1 -v)(l +u) 
We note that the HEG degeneration rate (u) is expected 
to be orders of magnitude larger than the introns/intein 
deletion. 

Analytical bound 2 

For an internal equilibrium to be maintained with the Z 
allele being at a high frequency,, the reduction in fitness 
associated with the endonuclease activity of the Z allele 
(t) has to be smaller than the reduction in fitness asso- 
ciated with the splicing element (s) or of similar magni- 
tude to it. In particular, the parameter t must be smaller 

v+ (l - Zeq) s 
than or equal to the positive number: ^ 

Zeq 

That is, the endonuclease must be precise to the extent 
that its activity is not significantly more toxic than the 
insertion sequence per se. 

Mathematical formulation 

V+ (l -Zeq)s 



Zeq > 0 ^ t < 



(5) 



^eq 



(see additional file 4, Proof S2). 



E.g. Zeq > 0.5 ^ t < s + 2v. 
Analytical bound 3 

For an internal equilibrium to be maintained with the 
Z allele being at high frequency, the transmission rate 
of the HEG must be low (albeit, above a positive 
threshold). In particular, the product hm must be 
smaller than or equal to the small number: 



t + 



,. We note that selection of 



^eq 



(l-v)(l-u)(l-s-t) 

various Z alleles in the presence of empty target sites 
(i.e. high x) would favor high homing ability, at least 
until specificity is optimal and any further increase in 
homing ability is accompanied by increased toxicity 
(see below). Therefore, we would expect to find hom- 
ing endonucleases persisting at high equilibrium levels 
only in organisms characterized by low rates of genetic 
exchange (low mating and or low recombination rates). 



Mathematical formulation 

Zeq > 0 ^ 



(6) 



(s + t) 



(l-u)(l-v) 



hm 



t + 



hm < 



^eq 



(l-v)(l-u)(l-s-t) 



(See additional file 5, Proof S3). 
In particular, z, 



eq 



(s + t) 



(1 - u) (1 - V) 



"2^ 
< hm 



t + 



hm < 



(l-v)(l-u)(l-s-t) 



Comparison with the Yahara model 

We and Yahara et al. [29] have used similar models with 
similar goals and reached strikingly different conclu- 
sions. Yahara et al. found stable internal equilibria to be 
possible only under the condition that the decrease in 
fitness resulting from carrying a dysfunctional HEase is 
higher than the decrease in fitness due to the functional 
HEase. Otherwise, they have identified only periodic 
solutions. In contrast, we found a significant and biolo- 
gically plausible range of parameters allowing for stable 
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equilibria to be maintained, all under the assumption 
that functional HEase decreases fitness more than dys- 
functional HEase (namely, 0 < t). When we take a para- 
meter set leading to stable equilibrium in our model 
and input it into the Yahara model we do find a stable 
equilibrium there as well (e.g. additional file 6, Figure 
SI). We assume that Yahara et al. were unable to detect 
the parameter range allowing for equilibrium because 
they only included mutation in their calculations when 
looking for equilibrium with z = 1. When searching for 
internal equilibria, Yahara et al. acted under the explicit 
assumption that mutation was a destabilizing factor, 
therefore neglecting it altogether. In contrast, we have 
shown that a positive mutation rate can facilitate the 
establishment of a stable equilibrium. Moreover, the Z 
equilibrium frequency in Additional file 6, Figure SI is 
low (z ~ 0.092) because the parameter choice adheres to 
the Yahara restriction (1-s)^ = 1-s-t (/3^ = a in their 
notation). When we allow s and t to be independent, 
stable internal equilibria with high Z frequency are read- 
ily detected (See Additional file 7, Figure S2). In sum- 
mary, the difference between the conclusions made by 
Yahara and us does not result from model differences 
(see Additional file 1, Table SI), but rather from the fact 
that Yahara et al. did not explore the parameter space 
exhaustively. 

Computer simulation 

To further elucidate the conditions facilitating the preser- 
vation of the Z allele we carried out a systematic sampling 
of the parameter space. For each parameter set we used a 
deterministic computer simulation starting with an initial 
population consisting almost entirely of X alleles, a minis- 
cule proportion of Z alleles (10'^) and no Y alleles. These 
starting conditions best model actual populations being 
invaded by HEGs. We stopped the simulation once equili- 
brium of allele frequencies has been reached. Equilibrium 
was assumed if the last 10,000 generations had a standard 
deviation, which was no more than 10'^. Additionally, if 
frequency of the HEG went under 0.001 after having been 
over 0.01, the frequency of z was given the value zero (to 
avoid cases which are likely to be stochastically eliminated 
in a finite population). In case no stable equilibrium was 
reached, we plot the average frequency of Z in the last 10^ 
of 10^ generations. Our results are presented as phase dia- 
grams depicting the evolutionary fate of the Z allele on 
color scale as a function of its homing ability and toxicity 
(e.g. Figure 3). For each phase diagram we sampled 10^ 
different parameter sets by crossing 100 different choices 
for the value of the product hm with 100 different choices 
for the value of s. Different relations between s and t were 
selected for each phase diagram as indicated, while u = 10' 
^ and V = 10"^ were taken to be constant and equal in all 
phase diagrams. 



The sub-space of parameter choices leading to stable 
allele frequency equilibria is bordered in our phase dia- 
grams by a solid black line (Figure 3). Importantly, the 
phase diagrams demonstrate that analytical bounds 1-3, 
allowing high equilibrium frequency of the Z allele, are 
tight, i.e. for almost any parameter set obeying these 
conditions, the homing endonuclease will indeed reach a 
high equilibrium frequency. Moreover, the conditions 
allowing the Z allele to reach a high average frequency 
in a limit cycle are a natural extension of the conditions 
allowing it to have high equilibrium frequency: 

1. Analytical bound 1 states that the insertion sequence 
must be detrimental (s > 0) for an internal equilibrium to 
be reached where the Z allele has a high frequency. Our 
phase diagrams illustrate that s > 0 is furthermore a prere- 
quisite for the Z allele to reach a high average frequency. 
This condition is biologically plausible. Even if the intein 
or intron has optimized its splicing activity, the insertion 
sequence still bears with it the burden of replicating, tran- 
scribing and in the case of luteins also translating, hun- 
dreds of superfluous nucleotides. 

2. Analytical bound 2 states that the endonuclease 
activity cannot be significantly more toxic than the inser- 
tion sequence per se (t < ~ s) for an internal equilibrium 
to be reached where the Z allele has a high frequency. 
Our phase diagrams demonstrate a clear trend for higher 
average z values to be associated with a lower t/s ratio 
(compare Figure 3a, b and 3c). 

3. Analytical bound 3 states that an internal equili- 
brium with a high frequency of z is possible only when 
the product hm is low, but not too low, so that hm > ~ s 
+t. For this reason, the values of hm are presented in the 
phase diagrams as products of s+t. In congruence with 
analytical bound 3, our phase diagrams demonstrate that 
high average frequencies of the Z allele are possible only 
when s + t < ~hm < 2 (s+t). Low rates of genetic exchange 
are therefore necessary for long term HEG persistence. 

Model predictions for finite populations 

Our analysis so far assumed an infinite population. For 
finite populations, genetic drift may delay, restrict or pre- 
clude the damping of the initial oscillatory dynamic (e.g. 
Figure 4 and Additional file 8, Figure S3). When the abso- 
lute values of the eigenvalues of the Jacobian are smaller 
than 1 but the eigenvalues do have imaginary parts (as is 
typically the case in our stable area), a deterministic sys- 
tem approaches equilibrium through oscillations (e.g. Fig- 
ure 2), and a finite population would exhibit some 
oscillations around equilibrium. When the absolute value 
of the complex eigenvalue is very close to 1, the oscilla- 
tions decay slowly. Then the question is the rate of oscilla- 
tions decay versus the strength of the perturbations 
resulting from random sampling. In particular, when the 
population is relatively small, so that drift results in 
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Figure 3 Phase diagrams depicting the evolutionary fate of the Z allele, in color scale (Right), as a function of its homing ability and 
toxicity. The colors correspond to the average Z frequency in the last 10^ of 10^ generations, u = 10"^ and v = 10"^ for all subfigures. A is for t 
= 0, B t = 0.1 s, and C t = s. When our deterministic simulations reached equilibrium we tested the asymptotic stability of that equilibrium by 
evaluating the eigenvalues of the Jacobian matrix at that point. The black line separates parameter values allowing for stable equilibria (bottom) 
from parameter values leading to no equilibria or non stable equilibria. 

V J 



significant perturbations, the system does not converge to 
equilibrium and sustains oscillations (e.g. Additional file 8, 
Figure S3 A). In larger populations convergence to near 
equilibrium does happen (e.g. Figure 4 and Additional file 
8, Figure S3 B-D). This finding is in congruence with the 
abundance of HEGs in unicellular organisms, which have 
a larger characteristic population size (> 10^ [36-39]). 

Evolutionary stability 

The dynamic stability of an allele frequency equilibrium is 
not sufficient to guarantee the long term persistence of a 
HEG in a real population [31]. This is because alleles with 
different homing ability and different toxicity do arise by 
mutation, threatening to disturb any status quo. Therefore, 
we set out to explore the stability of equilibria with high z 
values with respect to invasions by alleles with different 
attributes in order to discern the conditions rendering a 
homing endonuclease evolutionarily stable. 

1. Invasions by Z' alleles with different homing ability (h') 
and/or different homing related toxicity (f) 

All other parameters being equal, any mutation in a Z 
allele giving rise to a Z' allele having a higher specificity 
should prevail (i.e., any Z' allele with (h' > h and t' < t) OR 
(h' > h and t' < t))). A particular Z' may have a higher 
equilibrium value or a higher oscillation average than did 
Z, but if such invasions of Z alleles with higher specificity 
are repeated, eventually they might lead to the fixation of 
the HEG allele and to its subsequent degeneration. This 
destructive scenario is avoided, if we assume that when 
maximal specificity is obtained then the homing ability is 
at its biological upper bound and the toxicity is at its bio- 
logical lower bound. Evolutionary stability of Z is made 
possible for a broader range of h and t values under the 
biologically plausible assumption that once a threshold 
specificity is obtained, (A) any possible rise in potency "h" 
is accompanied by a rise in toxicity "t"; and vice versa: (B) 
any possible decrease in t is accompanied by a decrease in 
h (Figure 5 A and 5B). We note that conditions A) and B) 
may conflict for moderate values of h and t. Therefore, we 
expect to find internal equilibria where the Z allele has a 



high frequency, if the toxicity or homing ability of the 
HEG is near the respective biological bounds. 

2. Invasions by Y' or Z' alleles with lower intron/intein 
related toxicity (s') 

A mutation in a Y allele can give rise to a Y' allele with a 
lower toxicity s' < s. However Y' may not invade because Y 
is constantly regenerated by HEG degeneration. On the 
other hand, if the mutation reducing s to s' takes place in 
a Z allele (a mutation in the splicing motif of the HEG 
containing intron/intein) then the resulting Z' allele and 
its degenerated form Y' are expected to win over Z and Y 
respectively. As before, Z' may have a higher equilibrium 
value or higher oscillation average than did Z, but if such 
invasions are allowed to repeat till s = 0 they will even- 
tually lead to the fixation of a HEG allele and to its subse- 
quent degeneration. Therefore, allele equilibrium with a 
high Z frequency is evolutionarily stable only if the s para- 
meter is at its physical lower bound, and only if the order 
of magnitude of this lower bound is not in itself lower 
than the order of magnitude of u. Again, this is biologically 
plausible. The insertion sequence still bears with it the 
burden of replicating, transcribing and, in the case of 
luteins, also translating, hundreds of superfluous base- 
pairs. 

3. Invasions by an X allele with lower homing 
susceptibility 

From the host perspective, evading the HEG parasite 
might carry a fitness benefit. A mutation in an X allele 
occurring at the target site of the endonuclease can give 
rise to an X' allele with respect to which the Z allele has 
a reduced homing ability (h'). Under some circumstances, 
an invasion of this type may even increase the ultimate 
frequency of Z by preventing its fixation. However, if 
such invasions are repeated they will eventually lead to 

s + t 

the product hm being lower than — , pre- 

(1 - u) (1 - v) 

eluding the establishment of any equilibrium with z > 0 

(analytical bound 3). To secure the long term stability of 

the HEG, any reduction in the susceptibility of the empty 
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A N=10^ B N=10* 




Figure 4 Example of model predictions for finite populations. Random noise was added to our deterministic model as expected from 
binomial sampling in a population of size N = 10^10^ or 10^ (A-C, respectively). The parameters used: s = 0.01, t = 0.001, hm = 0.0123, u = 10"^ 
and V = 10"^, are identical to those in Figure 2a and result in a stable equilibrium in the deterministic model. 



allele for homing must be accompanied by a large 
enough reduction in the fitness of the empty allele (h' < h 
^ fitness of X = 1-r' where r' > 0) (Figure 6). This idea 
has valid biological support: Type I introns and inteins in 
general, and with HEGs in particular, are almost always 
found inserted in conserved sites of essential genes [40]. 
As a result of this pivotal location, if the host tries to 
evade the homing endonuclease parasite by changing 
nucleotides at the target site, then the host has to pay 
the cost associated with altering a highly conserved 
motif. Recently, in relation to HEG usage in gene therapy, 
we showed that HEG target conservation allows 
some HEases to specifically cleave human loci of medical 
interest [5]. 

Discussion 

Support for the model 

At least in some HEG families, HEGs have persisted over 
long evolutionary periods (up to 500 million years [21]) 
under purifying selection and without any evidence for 
their horizontal transfer [27,28] . This is in conflict with the 
idea of frequent HEG elimination and re-invasion [16,17] 
and also in conflict with the notion of oscillating HEG fre- 
quency nearing extinction at every cycle [29]. It is instead 



suggestive of a balancing evolutionary mechanism, keeping 
the HEG-containing allele at high frequency although it is 
usually associated with no fitness advantage for the host. 
Direct evidence for our rock-paper-scissors model is 
scarce. However, some indirect data imply that our predic- 
tions are valid. Vos and Didelot [41] compared recombina- 
tion rates (Corresponding roughly to the hm values in our 
model) in varied bacterial and archaeal species. All marine 
bacteria surveyed (M aeruginosa, M, chthonoplastes, P. shi- 
gelloides, P. ubique, V. parahaemolyticus, V. vulnificus) 
were characterized by high recombination rates with the 
exception of Microcoleus chtonoplastes. M. chtonoplastes is 
also the only one of these species known to encode a HEG 
(in the RIRl-2 intein). Amongst Eukaryotes, yeasts are par- 
ticularly abundant with HEGs. Several studies indicate that 
sex is yeast is indeed rare (as low as 1 out-crossing event in 
50,000 generations in Saccharomyces cerevisiae and Sac- 
charomyces paradoxus [42] and as low as 1 sexual cycle in 
1000 generations in S, paradoxus [37]) Finally, HEGs are 
widespread in mitochondria and chloroplasts. Both of 
these endosymbionts are notorious for their low rates of 
genetic exchange [43,44]. 

Our analysis of finite populations exemplified how 
genetic drift can sometime prevent the stabilization of 
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Figure 5 Phase diagrams depicting the evolutionary fate of the Z allele, in color scale (Right), after an invasion by a Z' allele, with a 
different homing ability h' and toxicity X'. A. h' < h and t' < t. B. h' > h and t' > t. Expectedly, for h' < h and t' > t, Z' will never invade, while for 
h' > h and t' < t, Z' will always invade (results not shown). The colors correspond to the average Z frequency in the last 10^ of 10^ generations, u = 
10"^, V = 10"^ and the rest of the parameters were chosen for a high value of z (0.733) in a stable equilibrium: s = 10"^, t = 10"^ hm = 0.0123. 



allele-frequency oscillations, endangering the long term 
persistence of the HEG. The phenomenon is especially 
pronounce in populations of smaller size, which may in 
part explain the prevalence of HEG in unicellular organ- 
isms characterized by large population sizes [36-39]. 



Our prediction that some toxicity of the insertion 
sequence is inevitable is supported by the positive corre- 
lation found between gene compactness and gene 
expression [45-47]. We further predict that for long 
term HEG persistence at high levels the toxicity of the 
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Figure 6 A phase diagram depicting the evolutionary fate of the Z allele, in color scale (Right), after an invasion by X' allele, a host 
sequence to which homing is more difficult, as a function of the parameters of X': the reduced homing ability (h') and the cost to the 
host (r). The colors correspond to the average Z frequency in the last 10^ of 10^ generations, u = 10"^, v = 10"^, and the rest of the parameters 
were chosen for a high value of z (0.733) in a stable equilibrium: s = 10"^ t = 10"^ h = 0.0123. 



endonuclease must not be higher than the toxicity of 
the insertion sequence per se. Indeed, HEases were 
shown to possess exquisite specificity. For example, the 
I-Scel HEase from the mitochondria of the budding 
yeast Sacharomyces cerevisiae (genome size ~8.5''10^ bp) 
has an extremely narrow and predictable range of clea- 
vage sites in the human genome [48] (genome size 
~3''10^ bp) and little if any toxicity in human cells and 
live mice [6,49]. Last, the importance and hence the 
conservation of intron and intein insertion sites, as pre- 
dicted by our model, have been repeatedly documented 
[20]. 

Finally, according to our model, certain HEGs might be 
stable in the short term but not in the long term, in the 
sense that particular mutants that might lead to extinction 
of the HEG can invade the population. In these cases the 
long term pattern of HEG evolution might be that of the 
homing cycle [21], but orders of magnitude slower. 

Possible applications for the predictions of the model 

It is interesting to assess our model not only from the 
perspective of evolutionary biology, but also in terms of 
the relevance of its predictions to gene therapy and 
genetic engineering. 

HEases may have very long target sites [3]. However, 
given the degeneracy in target recognition [50], the 



question remains whether many HEases, selected to 
cleave small genomes of microbes, mitochondria and 
chloroplasts would turn out to have a multitude of tar- 
gets in the human genome and therefore would be 
highly toxic. While most parasites (e.g. viruses) can bal- 
ance their toxicity with their ability to propagate and 
therefore could bear significant toxicity to their natural 
host, the same is not true for HEases. The Rock-Paper- 
Scissors model we put forward asserts that the toxicity 
of the HEase cannot be balanced by its ability to propa- 
gate but that in fact this toxicity is bound by the toxicity 
of the intron or intein (see analytical bound 2). We 
therefore expect the toxicity of the HEG to be extremely 
low in its natural host, possibly implying reduced toxi- 
city in human cells. 

Finally, our analysis of the evolutionary stability of 
HEG alleles in the face of X' invasions is of importance 
for future applications of HEases. We asserted that Z can 
persist only if the reduction in homing ability associated 
with the X' allele is accompanied by a substantial fitness 
cost. In other words, a HEase must base its target specifi- 
city on conserved base pairs that are crucial for the host. 
The HEase target specificity, being the range of nucleo- 
tide sequences it can cleave, is highly predictable based 
on knowledge of the degree of conservation of individual 
sites along the target [24,51-54]. Furthermore, Because 
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HEGs reside within introns or inteins found in conserved 
motifs, finding a HEase target in the human genome may 
often not be a random event but rather a result of evolu- 
tionary motif conservation from the HEG hosting 
microbe to humans [5]. 

Our model of HEG population dynamics delineates 
the under-recognized complexity of the evolution of 
selfish elements. A better understanding of these fasci- 
nating molecular parasites could pave the way for their 
wide application in gene therapy and bio-engineering. 

Conclusions 

We provide for the first time a plausible account for the 
evolutionary preservation of HEGs. Admittedly, our model 
is a simplification in that it assumes a large homogenous 
mixed population, non-overlapping generations, selection 
acting only at the haploid stage and more. However, even 
under these simplified conditions, we find solutions in 
which the active HEG can persist at high frequency over 
long periods of time. This is in agreement with the fact 
that HEGs are readily sampled from natural populations 
of varied organisms [3]. Furthermore, our analysis deline- 
ates the conditions allowing long term HEG persistence. 
In particular we find that long-lasting high frequencies of 
the Z allele are possible only if the population is character- 
ized by low rates of genetic exchange, if the insertion 
sequence is to some extent toxic, if the endonuclease 
activity is not significantly more toxic than the insertion 
sequence per se, and if the target site is important for the 
survival of the host. 

When the first indications for long-term persistence of 
functional HEGs in populations were uncovered, one 
explanation was to assume spatially distributed, inhomo- 
geneous populations, as was suggested for other biologi- 
cal non-transitive competition networks [32,55]. Here 
we have shown that, perhaps surprisingly, spatial inho- 
mogeneity is not a prerequisite for long-term persistence 
of HEGs at high penetrance. We determined conditions 
that are compatible with long-term HEG persistence, 
and we find that these conditions are met in populations 
that show a high frequency of HEGs. 
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